home *** CD-ROM | disk | FTP | other *** search
Text File | 1994-09-01 | 45.9 KB | 1,058 lines |
- CHAPTER 9
-
- PROGRAM OPTIMIZATION
-
- Throughout the preceding chapters I have shown a variety of tips and
- techniques that can help to improve the efficiency of your programs. For
- example, Chapter 6 explained that processing files in large pieces reduces
- the time needed to save and load data. Likewise, Chapter 8 discussed the
- improvement that SWAP often provides over conventional assignments. Some
- optimizations, however, do not fit into any of the well-defined categories
- that have been used to organize this book. In this chapter I will share
- several general optimization techniques you can employ to reduce the size
- of your programs and make them run faster.
- The material in this chapter is organized into three principle
- categories: programming shortcuts and speed improvements, miscellaneous
- tips and techniques, and benchmarking. Each section addresses BASIC
- programming ideas and methods that are not immediately obvious in most
- cases.
-
-
- PROGRAMMING SHORTCUTS AND SPEED IMPROVEMENTS
- ============================================
-
- Chapter 3 discussed the use of AND, OR, and the other logical operations
- that can be used for both logical (IF and CASE) tests and also bit
- operations. But there are a few other related points that are worth
- mentioning here. When you need to know if a variable is zero or not, you
- can omit an explicit test for zero like this:
-
- IF Variable THEN...
-
- You might be tempted to think that two variables could be tested for non-
- zero values at one time in the same way, using code such as this:
-
- IF Var1 AND Var2 THEN...
-
- However, that will very likely fail. The expression Var1 AND Var2 combines
- the bits in these variables, which could result in a value of zero even
- when both variables are non-zero. As an example, if Var1 currently holds
- a value of 1, its bits will be set as follows:
-
- 0000 0000 0000 0001
-
- Now, if Var2 is assigned the value 2, its bits will be set like this:
-
- 0000 0000 0000 0010
-
- Since no two bits are set in the same position in each variable, the result
- of Var1 AND Var2 is zero. An effective solution is IF Var1 * Var2 THEN to
- ensure that neither variable is zero. And to test if either variable is
- non-zero you'd use OR. Whatever follows the test IF Var1 OR Var2 THEN will
- be executed as long as one (or both) variables are not zero. These are
- important short cuts to understand, because the improvement in code size
- and execution speed can be significant.
- Each of the AND, OR, and multiplication tests shown here generates only
- 11 bytes of code. Contrast that to the 28 bytes that BC creates for the
- alternative: IF Var1 <> 0 AND Var2 <> 0 THEN. Because of the improved
- method of expression evaluation in BASIC PDS, this last example generates
- only 14 bytes when using that version of BC. None the less, if you can
- avoid explicit comparisons to zero you will go a long way toward improving
- the efficiency of your code.
- This short cut is equally appropriate with LOOP comparisons as well as
- IF tests. In the BufIn function shown in Chapter 6, INSTR was used to see
- if a CHR$(13) carriage return was present in the buffer. In the statement
- CR = INSTR(BufPos, Buffer$, CR$), CR receives either 0 if that character
- is present, or a non-zero position in the string where it was found. The
- LOOP statement that surrounded the buffer searching uses LOOP WHILE CR,
- which continues looping as long as CR is not zero.
- When an integer variable is compared in a LOOP WHILE condition, seven
- bytes of code are generated whether it is compared to zero or not. But
- when a long integer is used to control the LOOP WHILE condition, omitting
- the explicit test for zero results in 11 bytes of compiled code where
- including it creates 20 bytes. Note that with floating point values
- identical code is generated in either case, because an explicit comparison
- to zero is required and added by the compiler.
-
-
- PREDEFINING VARIABLES
-
- Another important point is illustrated in the same code fragment that uses
- INSTR to search for a carriage return. There, the CR$ string variable had
- been assigned earlier to CHR$(13). Although the BufIn code could have used
- CR = INSTR(BufPos, Buffer$, CHR$(13)) instead of a previously defined
- string variable to replace the CHR$(13), that would take longer each time
- the statement is executed. Since CHR$ is a function, it must be called
- each time it is used. If CR$ is defined once ahead of time, only its
- address needs to be passed to INSTR. This can be done with four bytes of
- assembly language code.
- If CHR$(13) will be used only once in a program, then the only savings
- afforded by predefining it will be execution speed. But when it is needed
- two or more times, several bytes can be saved at each occurrence by using
- a replacement string variable. Other common CHR$ values that are used in
- BASIC programs are the CHR$(34) quote character, and CHR$(0) which is often
- used when accessing DOS services with CALL Interrupt.
- Likewise, you should avoid calling any functions more than is absolutely
- necessary. I have seen many programmers use code similar to the following,
- to see if a drive letter has been given as part of a file name.
-
- IF INSTR(Path$, ":") THEN
- Drive$ = LEFT$(INSTR(Path$, ":") - 1)
- END IF
-
- A much better approach is to invoke INSTR only once, and save the results
- for subsequent testing:
-
- Found% = INSTR(Path$, ":") 'save the result from INSTR
- IF Found% THEN
- Drive$ = LEFT$(Path$, Found%) - 1)
- END IF
-
- The same situation holds true for UCASE$, MID$, and all of the other BASIC
- functions. Rather than this:
-
- IF INSTR(UCASE$(MID$(Work$, 3, 22)), "/A") THEN A = True
- IF INSTR(UCASE$(MID$(Work$, 3, 22)), "/B") THEN B = True
- IF INSTR(UCASE$(MID$(Work$, 3, 22)), "/C") THEN C = True
-
- Instead use this:
-
- Temp$ = UCASE$(MID$(Work$, 3, 22))
- IF INSTR(Temp$, "/A") THEN A = True
- IF INSTR(Temp$, "/B") THEN B = True
- IF INSTR(Temp$, "/C") THEN C = True
-
- Where the first example generates 138 bytes of code, the second uses only
- 111. The time savings will be even more significant, because BASIC's
- UCASE$ and MID$ functions allocate and deallocate memory by making further
- calls to BASIC's string memory management routines.
- Indeed, it is always best to avoid creating new strings whenever
- possible, precisely because of the overhead needed to assign and erase
- string data. Each time a string is assigned, memory must be found to hold
- it; add to that the additional code needed to release the older, abandoned
- version of the string.
- This has further ramifications with simple string tests as well. As
- Chapter 3 explained, testing for single characters or the first character
- in a string is always faster if you isolate the ASCII value of the
- character first, and then use integer comparisons later. In the example
- below, the first series of IF tests generates 60 bytes of code. This is
- much less efficient than the second which generates only 46, even though
- the steps to obtain and assign the ASCII value of Answer$ comprise 12 of
- those bytes.
-
- PRINT "Abort, Retry, or Fail? (A/R/F) ";
- DO
- Answer$ = UCASE$(INKEY$)
- LOOP UNTIL LEN(Answer$)
-
- '----- Method 1:
- IF Answer$ = "A" THEN
- REM
- ELSEIF Answer$ = "R" THEN
- REM
- ELSEIF Answer$ = "F" THEN
- REM
- END IF
-
-
- '----- Method 2:
- A% = ASC(Answer$)
- IF A% = 65 THEN
- REM
- ELSEIF A% = 82 THEN
- REM
- ELSEIF A% = 70 THEN
- REM
- END IF
-
- Another prime candidate for speed enhancement is when you need to create
- a string from individual characters. The first example below reads the 80
- characters in the top row of display memory, and builds a new string from
- those characters.
-
- Scrn$ = ""
- FOR X = 1 TO 80
- Scrn$ = Scrn$ + CHR$(SCREEN(1, X))
- NEXT
-
- Since we already know that 80 characters are to be read, a much better
- method is to preassign the destination string, and insert the characters
- using the statement form of MID$, thus:
-
- Scrn$ = SPACE$(80)
- FOR X% = 1 TO 80
- MID$(Scrn$, X%, 1) = CHR$(SCREEN(1, X%))
- NEXT
-
- An informal timing test that executed these code fragments 100 times using
- QuickBASIC 4.5 showed that the second example is nearly twice as fast as
- the first. Moreover, since BASIC's SCREEN function is notoriously slow,
- the actual difference between building a new string and inserting
- characters into an existing string is no doubt much greater.
-
-
- INTEGER AND LONG INTEGER ASSIGNMENTS
-
- Another facet of compiled BASIC that is probably not immediately obvious
- is the way that integer and long integer assignments are handled by the
- compiler. When many variables are to be assigned the same value--perhaps
- cleared to zero--it is often more efficient to assign one of them from that
- value, and then assign the rest from the first. To appreciate why this is
- so requires an understanding of how BASIC compiles such assignments.
- Normally, assigning an integer or long integer variable from a numeric
- constant requires the same amount of code as assigning from another
- variable. The BASIC statement X% = 1234 is compiled to the following 6-
- byte assembly language statement.
-
- C7063600D204 MOV WORD PTR [X%],1234
-
- Assigning the long integer variable Y& requires two such 6-byte
- instructions--one for the low word and another for the high word:
-
- C7063600D204 MOV WORD PTR [Y&],1234 ;assign the low word
- C70638000000 MOV WORD PTR [Y&+2],0 ;then the high word
-
- The 80x86 family of microprocessors does not have direct instructions for
- moving the contents of one memory location to another. Therefore, the
- statement X% = Y% is compiled as follows, with the AX register used as an
- intermediary.
-
- A13800 MOV AX,WORD PTR [Y%] ;move Y% into AX
- A33600 MOV WORD PTR [X%],AX ;move AX into X%
-
- Assigning one long integer from another as in X& = Y& is handled similarly:
-
- A13A00 MOV AX,WORD PTR [Y&] ;move AX from Y& low
- 8B163C00 MOV DX,WORD PTR [Y&+2] ;move DX from Y& high
- A33600 MOV WORD PTR [X&],AX ;move X& low from AX
- 89163800 MOV WORD PTR [X&+2],DX ;move X& high from DX
-
- You may have noticed that instructions that use the AX registers require
- only three bytes to access a word of memory, while those that use DX (or
- indeed, any register other than AX) require four. But don't be so quick
- to assume that BASIC is not optimizing your code. The advantage to using
- separate registers is that the full value of Y& is preserved. Had AX been
- used both times, the low word would be lost when the high word was
- transferred from Y& to X&.
- When assigning one variable to many in a row, BASIC is smart enough to
- remember which values are in which registers, and it reuses those values
- for subsequent assignments. The combination BASIC and assembly language
- code shown below was captured from a CodeView session and edited slightly
- for clarity. It shows the actual assembly language code bytes generated
- for a series of assignments.
-
- Plain integer assignments:
-
- A% = 1234
- C7063600D204 MOV WORD PTR [A%],&H04D2
- B% = 1234
- C7063800D204 MOV WORD PTR [B%],&H04D2
- C% = 1234
- C7063A00D204 MOV WORD PTR [C%],&H04D2
- D% = 1234
- C7063C00D204 MOV WORD PTR [D%],&H04D2
- E% = 1234
- C7063E00D204 MOV WORD PTR [E%],&H04D2
-
-
- Plain long integer assignments:
-
- V& = 1234
- C7064000D204 MOV WORD PTR [V&],&H04D2
- C70642000000 MOV WORD PTR [V&+2],0
- W& = 1234
- C7064400D204 MOV WORD PTR [W&],&H04D2
- C70642000000 MOV WORD PTR [W&+2],0
- X& = 1234
- C7064800D204 MOV WORD PTR [X&],&H04D2
- C70642000000 MOV WORD PTR [X&+2],0
- Y& = 1234
- C7064C00D204 MOV WORD PTR [Y&],&H04D2
- C70642000000 MOV WORD PTR [Y&+2],0
- Z& = 1234
- C7065000D204 MOV WORD PTR [Z&],&H04D2
- C70642000000 MOV WORD PTR [Z&+2],0
-
-
- Assigning multiple integers from another:
-
- A% = 1234
- C7063600D204 MOV WORD PTR [A%],&H04D2
- B% = A%
- A13600 MOV AX,WORD PTR [A%]
- A33800 MOV WORD PTR [B%],AX
- C% = A%
- A33A00 MOV WORD PTR [C%],AX
- D% = A%
- A33C00 MOV WORD PTR [D%],AX
- E% = A%
- A33E00 MOV WORD PTR [E%],AX
-
-
- Assigning multiple long integers from another:
-
- V& = 1234
- C7064000D204 MOV WORD PTR [V&],&H04D2
- C70642000000 MOV WORD PTR [V&+2],0
- W& = V&
- A14000 MOV AX,WORD PTR [V&]
- 8B164200 MOV DX,WORD PTR [V&+2]
- A34400 MOV WORD PTR [W&],AX
- 89164600 MOV WORD PTR [W&+2],DX
- X& = V&
- A34800 MOV WORD PTR [X&],AX
- 89164A00 MOV WORD PTR [X&+2],DX
- Y& = V&
- A34C00 MOV WORD PTR [Y&],AX
- 89164E00 MOV WORD PTR [Y&+2],DX
- Z& = V&
- A35000 MOV WORD PTR [Z&],AX
- 89165200 MOV WORD PTR [Z&+2],DX
-
-
- The first five statements assign the value 1234 (04D2 Hex) to integer
- variables, and each requires six bytes of code. The next five instructions
- assign the same value to long integers, taking two such instructions for
- a total of 12 bytes for each assignment. Note that a zero is assigned to
- the higher word of each long integer, because the full Hex value being
- assigned is actually &H000004D2. Simple multiplication shows that the five
- integer assignments generates five times six bytes, for a total of 30
- bytes. The long integer assignments take twice that at 60 bytes total.
- But notice the difference in the next two statement blocks. The first
- integer assignment requires the usual six bytes, and the second does as
- well. But thereafter, any number of additional integer variables will be
- assigned with only three bytes apiece. Likewise, all but the first two
- long integer assignments are implemented using only seven bytes each.
- Remembering what values are in each register is yet one more optimization
- that BASIC performs as it compiles your program.
-
-
- SHORT CIRCUIT EXPRESSION EVALUATION
-
- Many programming situations require more than one test to determine if a
- series of instructions are to be executed or a branch taken. The short
- example below tests that a string is not null, and also that the row and
- column to print at are legal.
-
- IF Work$ <> "" AND Row <= 25 AND Column <= 80 THEN
- LOCATE Row, Column
- PRINT Work$
- END IF
-
- When this program is compiled with QuickBASIC, all three of the tests are
- first performed in sequence, and the results are then combined to see if
- the LOCATE and PRINT should be performed. The problem is that time is
- wasted comparing the row and column even if the string is null. When speed
- is the primary concern, you should test first for the condition that is
- most likely to fail, and then use a separate test for the other conditions:
-
- IF Work$ <> "" THEN
- IF Row <= 25 AND Column <= 80 THEN
- LOCATE Row, Column
- PRINT Work$
- END IF
- END IF
-
- This separation of tests is called *short circuit expression evaluation*,
- because you are bypassing--or short circuiting--the remaining tests when
- the first fails. Although it doesn't really take BASIC very long to
- determine if a string is null, the principle can be applied to other
- situations such as those that involve file operations like EOF and LOF.
- Further, as you learned in Chapter 3, a better way to test for a non-null
- string is IF LEN(Work$) THEN. However, the point is to perform those tests
- that are most likely to fail first, before others that are less likely or
- will take longer.
- Another place where you will find it useful to separate multiple tests
- is when accessing arrays. If you are testing both for a legal element
- number and a particular element value, QuickBASIC will give a "Subscript
- out of range" error if the element number is not valid. This is shown
- below.
-
- IF Element <= MaxEls AND Array(Element) <> 0 THEN
-
- Since QuickBASIC always performs both tests, the second will cause an error
- if Element is not a legal value. In this case, you *have* to implement the
- tests using two separate statements:
-
- IF Element <= MaxEls THEN
- IF Array(Element) <> 0 THEN
- .
- .
- END IF
- END IF
-
- You may have noticed the I have referred to QuickBASIC here exclusively
- in this discussion. Beginning with BASIC 7.0, Microsoft has added short
- circuit testing to the compiler as part of its built-in decision making
- process. Therefore, when you have a statement such as this one:
-
- IF X > 1 AND Y = 2 AND Z < 3 THEN
-
- BASIC PDS substitutes the following logic automatically:
-
- IF X <= 1 THEN GOTO SkipIt
- IF Y <> 2 THEN GOTO SkipIt
- IF Z >= 3 THEN GOTO SkipIt
- .
- .
- SkipIt:
-
- Speaking of THEN and GOTO, it is worth mentioning that the keyword THEN
- is not truly necessary when the only thing that follows is a GOTO. That
- is, IF X < 1 GOTO Label is perfectly legal, although the only savings is
- in the program's source code.
- This next and final trick isn't technically a short circuit expression
- test, but it can reduce the size of your programs in a similar fashion.
- Chapter 3 compared the relative advantages of GOSUB routines and called
- subprograms, and showed that a subprogram is superior when passing
- parameters, while a GOSUB is much faster and smaller. An ideal compromise
- in some situations is to combine the two methods.
- If you have a called subprogram (or function) that requires a large
- number of parameters and it is called many times, you can use a single call
- within a GOSUB routine. Since a GOSUB statement generates only three bytes
- of code each time it is used, this can be an ideal way to minimize the
- number of times that the full CALL is required. Of course, GOSUB does not
- accept parameters, but many of them may be the same from call to call.
- In particular, some third-party add-on libraries require a long series of
- arguments that are unlikely to change. This is shown below.
-
-
- Row = 10
- Column = 20
- Message$ = "Slap me five"
- GOSUB DisplayMsg
- .
- .
- DisplayMsg:
- CALL ManyParams(Row, Column, Message$, MonType, NoSnow, FGColr, BGColr, _
- HighlightFlag, VideoMode, VideoPage)
- RETURN
-
-
- In many cases you would have assigned permanent values for the majority
- of these parameters, and it is wasteful to have BASIC create code to pass
- them repeatedly. Here, the small added overhead of the three assignments
- prior to each GOSUB results in less code than passing all ten arguments
- repeatedly.
-
-
- MISCELLANEOUS TIPS AND TECHNIQUES
- =================================
-
- There are many tricks that programmers learn over the years, and the
- following are some of the more useful ones I have developed myself, or come
- across in magazines and other sources.
-
-
- FORMATTING AND ROUNDING
-
- One frequent requirement in many programs is having control over how
- numbers are formatted. Of course, BASIC has the PRINT USING statement
- which is adequate in most cases. And Chapter 6 also showed how to trick
- BASIC's file handling statements into letting you access a string formatted
- by PRINT USING. But there are other formatting issues that are not handled
- by BASIC directly.
- One problem for many programmers is that BASIC adds leading and trailing
- blanks when printing numbers on the screen or to a disk file. The leading
- blank is a placeholder for a possible minus sign, and is not added when the
- number is in fact negative. Avoiding the trailing blank is easy; simply
- use PRINT STR$(Number). And the easiest way to omit the leading blank for
- positive numbers is to use LTRIM$: PRINT LTRIM$(STR$(Number)).
- PRINT USING is notoriously slow, because examines each character in a
- string version of the number, and reformat the digits while interpreting
- the many possible options specified in a separate formatting string. But
- in many cases all that is really needed is simple right justification. To
- right-align an integer value (or series of values) you can use RSET to
- assign the numbers into a string, and then print that string as shown
- below.
-
- Work$ = SPACE$(10)
- REST Work$ = STR$(Number)
- PRINT TAB(15); Work$
-
- In this case, Work$ could also have been dimensioned as a fixed-length
- string. Adding leading zeros to a number is also quite easy using RIGHT$
- like this:
-
- PRINT RIGHT$("00000" + LTRIM$(STR$(Number)), 6)
-
- You will need at least as many zeros in the string as the final result
- requires, less one since STR$ always returns at least one digit. Trailing
- digits are handled similarly, except you would use LEFT$ instead of RIGHT$.
- Rounding numbers is an equally common need, and there are several ways
- to handle this. Of course, INT and FIX can be used to truncate a floating
- point value to an integer result, but neither of these perform rounding.
- For that you should use CINT or CLNG, which do round the number to the
- closest integer value. For example, Value = CINT(3.59) will assign 4 to
- Value, regardless of whether Value is an integer, single precision, or
- whatever.
- Some BASICs have a CEIL function, which returns the next *higher* integer
- result. That is, CEIL(3) is 3, but CEIL(3.01) returns the value 4. This
- function can be easily simulated using Ceil = -INT(-Number).
- Rounding algorithms are not quite so simple to implement, as you can see
- in the short DEF FN function below.
-
- DEF FnRound# (Value#, Digits%)
- Mult% = 10 ^ Digits%
- FnRound# = FIX((Mult% * Value#) + (SGN(Value#)) * .5#) / Mult%
- END DEF
-
- Another important math optimization is to avoid exponentiation whenever
- possible. Whether you are using integers or floating point numbers, using
- Number ^ 2 and Number ^ 3 are many times slower than Number * Number and
- Number * Number * Number respectively.
-
-
- STRING TRICKS AND MINIMIZING PARAMETERS
-
- There are a few string tricks and issues worth mentioning here too. The
- fastest and smallest way to clear a string without actually deleting it is
- with LSET Work$ = "". Another clever and interesting string trick lets you
- delete a string with only nine bytes of code, instead of the usual 13.
- In Chapter 6 you learned that the assembly language routines within
- BASIC's runtime library are accessible if you know their names. You can
- exploit that by using the B$STDL (string delete) routine, which requires
- less code to set up and call than the more usual Work$ = "". When a string
- is assigned to a null value, two parameters--the address of the target
- string and the address of the null--are passed to the string assignment
- routine. But B$STDL needs only the address of the string being deleted.
- You might think that BASIC would be smart enough to see the "" null and
- call B$STDL automatically, but it doesn't. Here is how you would declare
- and call B$STDL:
-
- DECLARE SUB DeleteStr ALIAS "B$STDL" (Work$)
- CALL DeleteStr(Any$)
-
- As with the examples that let you call GET # and PUT # directly, DeleteStr
- will not work in the QB environment unless you first create a wrapper
- subprogram written in BASIC, and include that wrapper in a Quick Library.
- And this brings up an important point. Why bother to write a BASIC
- subprogram that in turn calls an internal routine, when the BASIC
- subprogram could just as easily delete the string itself? Therefore, the
- best solution--especially because it's also the easiest--is to write
- DeleteStr in BASIC thus:
-
- SUB DeleteStr(Work$)
- Work$ = ""
- END SUB
-
- This is an important concept to be sure, because it shows how to reduce
- the number of parameters when a particular service is needed many times.
- Other similar situations are not hard to envision, whereby multiple
- parameters that do not change from call to call can be placed into a
- subprogram that itself requires only one or two arguments.
- This technique can be extended to several BASIC statements that use more
- parameters than might otherwise be apparent. For example, whenever you use
- LOCATE, additional hidden parameters are passed to the B$LOCT routine
- beyond those you specify. The statement LOCATE X, Y generates 22 bytes of
- code, even though other called routines that take two parameters need only
- 13. (Every passed parameter generates four bytes of code, and the actual
- CALL adds five more. This is the same whether the routine being called is
- an internal BASIC statement, a BASIC subprogram or function, or an assembly
- language routine.) Therefore, if you use LOCATE with two arguments
- frequently in a program, you can save nine bytes for each by creating a
- BASIC subprogram that performs the LOCATE:
-
- SUB LocateIt(Row, Column) STATIC
- LOCATE Row, Column
- END SUB
-
- Similarly, if you frequently turn the cursor on and off, you should create
- two subprograms--perhaps called CursorOn and CursorOff--that invoke LOCATE.
- Since no parameters are required, the savings will add up quickly. Calling
- either of the subprograms below generates only five bytes of code, as
- opposed to 18 for the statement LOCATE , , 1 and 20 for LOCATE , , 0.
-
- SUB CursorOn STATIC
- LOCATE , , 1
- END SUB
-
- SUB CursorOff STATIC
- LOCATE , , 0
- END SUB
-
- The COLOR statement also requires more parameters than the number of
- arguments you give. Where COLOR FG, BG generates 22 bytes of compiled
- code, CALL ColorIt(FG, BG) creates only 13. CLOSE is yet another BASIC
- statement that accepts multiple arguments, and it too requires hidden
- parameters. Using CLOSE #X compiles to 13 bytes, and CALL CloseIt(X) is
- only nine.
- The reason that BASIC sends more parameters than you specify is because
- these routines need extra information to know which and how many arguments
- were given. In the case of LOCATE, each argument is preceded with a flag
- that tells if the next one was given. CLOSE is similar, except the last
- parameter tells how many file numbers were specified. Remember, you can
- use CLOSE alone to close all open files, or CLOSE 1, 3, 4 to close only
- those files numbers. Therefore, BASIC requires some way to tell the CLOSE
- statement how many file numbers there are.
- Another place where several statements can be consolidated within a
- single procedure is when peeking and poking memory. BASIC's PEEK and POKE
- are limited because they can access only one byte in memory at a time. But
- many useful memory locations are in fact organized as a pair of bytes, as
- you will see in Chapter 10. Instead of using code to combine or separate
- the bytes each time memory is accessed, you can use the following short
- routines that let you peek and poke two bytes at once.
-
-
- DECLARE FUNCTION PeekWord%(Address%)
- PeekWord% = PEEK(Address%) + 256 * PEEK(Address% + 1)
- END FUNCTION
-
- DECLARE SUB PokeWord(Address%, Value%)
- POKE Address%, Value% AND 255
- POKE Address% + 1, Value% \ 256
- END SUB
-
-
- Because these routines use BASIC's PEEK and POKE, you still need to use DEF
- SEG separately. Of course, the segment could be added as another
- parameter, and assigned within the routines:
-
-
- DECLARE FUNCTION PeekWord%(Segment%, Address%)
- DEF SEG = Segment%
- PeekWord% = PEEK(Address%) + 256 * PEEK(Address% + 1)
- END FUNCTION
-
-
- WORD WRAPPING
-
- A string handling technique you will surely find useful is implementing
- word wrapping. There are a number of ways to do this, and the following
- code shows one that I have found to be very efficient.
-
- DEFINT A-Z
- SUB WordWrap (X$, Wide, LeftMargin)
-
- Length = LEN(X$) 'remember the length
- Pointer = 1 'start at the beginning of the string
- IF LeftMargin = 0 THEN LeftMargin = 1
-
- 'Scan a block of Wide characters backwards, looking for a blank. Stop
- ' at the first blank, or upon reaching the beginning of the string.
- DO
- FOR X = Pointer + Wide TO Pointer STEP -1
- IF MID$(X$, X, 1) = " " OR X = Length + 1 THEN
- LOCATE , LeftMargin
- PRINT MID$(X$, Pointer, X - Pointer);
- Pointer = X + 1
- WHILE MID$(X$, Pointer, 1) = " "
- Pointer = Pointer + 1
- WEND
- IF POS(0) > 1 THEN PRINT
- EXIT FOR
- END IF
- NEXT
- LOOP WHILE Pointer < Length
- END SUB
-
- The WordWrap subprogram expects the text for display to be in a single
- long string. You pass it that text, a left margin, and a width. You could
- certainly add enhancements to this routine such as a color parameter, or
- the ability to format the text and send it to a printer or disk file.
-
-
- UNUSUAL WAYS TO ACCESS DISPLAY MEMORY
-
- If you ever tried to print a character in the lower-right corner of the
- display screen, you probably discovered that it cannot be done [with many
- BASIC versions] without causing the screen to scroll up. The only solution
- I am aware of is to use POKE to assign the character (and optionally its
- color) to display memory directly as shown below.
-
- DEF SEG = &HB800 'use &HB000 for a monochrome display
- POKE 3998, 65 'ASCII code for the letter "A"
- POKE 3999, 9 'bright blue on black
-
- The second trick also uses display memory in an unconventional manner.
- All video adapters contain at least 4096 bytes of on-board memory. Even
- though a 25 line by 80 column text mode screen uses only 4000 bytes (2000
- characters plus 2000 colors), memory chips are built in multiples of 1,024
- bytes. Therefore, you can use the last 96 bytes on the display adapter
- in your programs. If the adapter supports multiple video pages, then you
- can use the last 96 bytes in each 25-line page.
- One use for this memory is to provide a way to communicate small amounts
- of information between separate programs. When you don't want to structure
- an application to use CHAIN, the only other recourse is to use a disk file
- to pass information between the programs. But if all that is needed is a
- file name or drive letter, using a file can be awkward and slow, especially
- if the program is running from a floppy disk.
- One way to access this video memory is with PEEK and POKE. But PEEK and
- POKE are awkward too, and can access only one byte at a time. A better
- approach is to use an assembly language routine to copy one contiguous
- memory block to another location. The MemCopy routine below is designed
- to do exactly this.
-
- ;MEMCOPY.ASM, copies a block of memory from here to there
-
- .Model Medium, Basic
- .Code
-
- MemCopy Proc Uses DS ES SI DI, FromAdr:DWord, ToAdr:DWord, NumBytes:Word
-
- Cld ;copy in the forward direction
-
- Mov SI,NumBytes ;get the address for NumBytes%
- Mov CX,[SI] ;put it into CX for copying below
-
- Les DI,FromAdr ;load ES:DI with the source address
- Lds SI,ToAdr ;load DS:SI with destination address
-
- Shr CX,1 ;copy words instead of bytes for speed
- Rep Movsw ;do the copy
- Adc CX,CX ;this will set CX to either 0 or 1
- Rep Movsb ;copy the odd byte if necessary
-
- Ret ;return to BASIC
-
- MemCopy Endp
- End
-
- MemCopy may be declared and called in two different ways. The first uses
- SEG and is most appropriate when you are copying data between variables,
- for example from a group of elements in one array to elements in another.
- The second lets you specify any arbitrary segment and address, and it
- requires the BYVAL modifier either in the DECLARE statement, the CALL, or
- both. Each method is shown below.
-
- DECLARE SUB MemCopy(SEG AnyVar1, SEG AnyVar2, Numbytes%)
- CALL MemCopy(AnyVar1, AnyVar2, NumBytes%)
-
- DECLARE SUB MemCopy(BYVAL Seg1%, BYVAL Adr1%, BYVAL Seg2%, _
- BYVAL Adr2%, NumBytes%)
- CALL MemCopy(SourceSeg%, SourceAdr%, DestSeg%, DestAdr%, NumBytes%)
-
- You may also use a combination of these, perhaps with SEG for the source
- argument and BYVAL for the second. For example, to copy a 20-byte TYPE
- variable to the area just past the end of video memory on a color display
- adapter you would do this:
-
- CALL MemCopy(SEG TypeVar, BYVAL &HB800, BYVAL 4000, 20)
-
- In many cases you may need to use MemCopy in more than one way in the same
- program. For this reason it is probably better not to declare it at all.
- Once a subprogram or function has been declared, BASIC will refuse to let
- you change the number or type of parameters. But if you don't include a
- declaration at all, you are free to use any combination of SEG and BYVAL,
- and also any type of variable.
- It is important to understand that numeric and TYPE variables should be
- specified using SEG, so MemCopy will know the full address where the
- variable resides. You could use a combination of BYVAL VARSEG(Variable)
- and BYVAL VARPTR(Variable), but that is not quite as efficient as SEG.
- Copying to or from a conventional string using QuickBASIC requires SADD
- (string address) instead of VARPTR; far strings in BASIC 7 require SADD,
- and also SSEG (string segment) instead of VARSEG.
-
-
- REBOOTING A PC
-
- Another simple trick that is not obvious to many programmers is how to
- reboot a PC. Although most PC technical reference manuals show an
- interrupt service for rebooting, that simply does not work with most
- computers. However, every PC has a BIOS routine that is at a fixed
- address, and which may be called directly like this:
-
- DEF SEG = &HFFFF
- CALL Absolute(0)
-
- The Absolute routine is included in thee QB and QBX libraries that come with
- BASIC. If a cold boot with the full memory test and its attendant delay
- is acceptable, then the code shown above is all that you need. Otherwise,
- you must poke the special value &H1234 in low memory as a flag to the BIOS
- routine, so it will know that you want a warm boot instead:
-
- DEF SEG = 0
- POKE &H473, &H12
- POKE &H472, &H34
- DEF SEG = &HFFFF
- CALL Absolute(0)
-
-
- INTEGER VALUES GREATER THAN 32K
-
- As you learned in Chapter 2, an integer variable can hold any value between
- -32768 and 32767. When this range of numbers is considered, the integer
- is referred to as being a signed number. But the same range of values can
- also be treated as unsigned numbers spanning from 0 through 65535. Since
- BASIC does not support unsigned integers, additional trickery is often
- needed to pass values between 32768 and 65535 to assembler routines and DOS
- and BIOS services you invoke with CALL Interrupt. One way to do this is
- to use a long integer first, and add an explicit test for values higher
- than 32767:
-
- Temp& = NumBytes&
- IF Temp& > 32767 THEN
- IntBytes% = Temp& - 65536
- ELSE
- IntBytes% = Temp&
- END IF
-
- To reverse the process you would test for a negative value:
-
- IF IntBytes% < 0 THEN
- NumBytes& = IntBytes% + 65536
- ELSE
- NumBytes& = IntBytes%
- END IF
-
- Although this method certainly works, it is inefficient because of the
- added IF testing. When you merely need to pass a variable to a called
- routine, you can skip this testing and simply pass the long integer
- directly. This may appear counter to the rule that you must always pass
- the same type of variable that a subroutine expects. But as long as the
- arguments are not being passed by value using BYVAL, this method works and
- adds no extra code.
- When a parameter is passed to a subprogram or function, BASIC sends the
- address of its first byte as shown in Figure 9-1.
-
-
- ─ ─ ─ ┬────┬────┬────┬────┬ ─ ─ ─
- │ B1 │ B2 │ B3 │ B4 │
- ─ ─ ─ ┴────┴────┴────┴────┴ ─ ─ ─
- ^
- └───────── Address passed to the routine
-
- Figure 9-1: Passing a long integer where a regular integer is expected.
-
-
- Here, B1, B2, and so forth refer to the Bytes 1 through 4 of a long integer
- variable. Since the assembly language routine is expecting a regular
- integer, it looks at just the first two bytes of the variable. Thus, a
- long integer can be used even when a conventional integer is expected. Of
- course, any excess greater than 65535 will be ignored by the routine, since
- the bits that hold the excess are in the third and fourth bytes.
-
-
- BENCHMARKING
- ============
-
- Throughout this book I have emphasized the importance of writing code that
- is as small and fast as possible. And these goals should be obvious to all
- but the most novice programmer. But it is not always obvious how to
- determine for yourself which of several approaches yields code that is the
- smallest or fastest. One way is to use Microsoft CodeView, which lets you
- count the bytes of assembler code that are generated. This is how I
- obtained the byte counts stated throughout this book.
- But smaller is not always faster. Further, the code that BASIC
- generates is not the whole story. In many cases BASIC makes calls to its
- runtime library routines, and you would have to trace through those as well
- to know the total byte count for a given statement. It is not impossible
- to trace through the BASIC runtime using CodeView, but it certainly can be
- tedious. Many of BASIC's internal routines are very convoluted--especially
- those that allocate and deallocate string and other memory. Often it is
- simpler to devise a test that executes a series of statements many times,
- and then time how long the test took.
- As an example for this discussion, I will compare two different ways to
- print three strings in succession and show how to tell which produces less
- code, and which is faster. The first statement below prints each string
- separately, and the second combines the strings and then prints them as
- one.
-
- 1: PRINT X$; Y$; Z$
- 2: PRINT X$ + Y$ + Z$
-
- Since the length of each string will certainly influence how long it takes
- to print them, each of the strings is first initialized to 80 characters
- as follows:
-
- X$ = STRING$(80, "X")
- Y$ = STRING$(80, "Y")
- Z$ = STRING$(80, "Z")
-
- It is important to understand that the PRINT statement itself will be a
- factor, since it takes a certain amount of time to copy the characters
- from each string to display memory. Worse, if the screen needs to be
- scrolled because the text runs past the bottom of the display, that will
- take additional time. To avoid the overhead of scrolling, the test program
- uses LOCATE to start each new print statement at the top of the screen.
- Of course, using LOCATE adds further to the overhead, but in this case much
- less than scrolling would. To prove this to yourself, disable the line
- that contains the LOCATE statement. Here's the complete benchmark program:
-
- CLS
- X$ = STRING$(80, "X") 'create the test string
- Y$ = STRING$(80, "Y")
- Z$ = STRING$(80, "Z")
-
- Synch! = TIMER 'synchronize to TIMER
- DO
- Start! = TIMER
- LOOP WHILE Start! = Synch!
-
- FOR X = 1 TO 1000 '1000 times is adequate
- LOCATE 1
- PRINT X$; Y$; Z$
- NEXT
-
- Done! = TIMER 'calculate elapsed time
- Test1! = Done! - Start!
-
- Synch! = TIMER 'as above
- DO
- Start! = TIMER
- LOOP WHILE Start! = Synch!
-
- FOR X = 1 TO 1000
- LOCATE 1
- PRINT X$ + Y$ + Z$
- NEXT
-
- Done! = TIMER
- Test2! = Done! - Start!
-
- PRINT USING "##.## seconds using three strings"; Test1!
- PRINT USING "##.## seconds using concatenation"; Test2!
-
- Notice the extra step that synchronizes the start of each test to BASIC's
- TIMER function. As you probably know, the PC's system time is updated
- approximately 18 times per second. Therefore, it is possible that the test
- loop could begin just before the timer is about to be incremented. In that
- case the elapsed time would appear to be 1/18th second longer than the
- actual time. To avoid this potential inaccuracy, the DO loop waits until
- a new time period has just begun. There is still a similar accuracy loss
- at the end of the test when Done! is assigned from TIMER. But by
- synchronizing the start of the test, the error is limited to 1/18th second
- instead of twice that.
- When you compile and run this program using QuickBASIC 4.5, it will be
- apparent that the first test is more than three times faster than the
- second. However, with BASIC 7.1--using either near or far strings--the
- second is in fact slightly faster. Therefore, which is better depends on
- the version of your compiler, and there is no single best answer. Now
- let's compare code size.
- The disassemblies shown below are valid for both QuickBASIC 4.5 and
- BASIC 7.1. By counting bytes you can see that printing the strings using
- a semicolon generates 27 bytes, while first concatenating the strings
- requires 29 bytes.
-
-
- PRINT X$; Y$; Z$
- B83600 MOV AX,X$ ;get the address for X$
- 50 PUSH AX ;pass it on
- 9AD125FF4A CALL B$PSSD ;print with a semicolon
- B83A00 MOV AX,Y$ ;as above for Y$
- 50 PUSH AX
- 9AD125FF4A CALL B$PSSD
- B83E00 MOV AX,Z$
- 50 PUSH AX
- 9AD625FF4A CALL B$PESD ;print with end of line
-
- PRINT X$ + Y$ + Z$
- B83600 MOV AX,X$ ;get the address for X$
- 50 PUSH AX ;pass it on
- B83A00 MOV AX,Y$ ;get the address for Y$
- 50 PUSH AX ;pass that on too
- 9AD728FF4A CALL B$SCAT ;call String Concatenate
- 50 PUSH AX ;pass the combined result
- B83E00 MOV AX,Z$ ;get the address for Z$
- 50 PUSH AX ;pass it on
- 9AD728FF4A CALL B$SCAT ;combine that too
- 50 PUSH AX ;pass X$ + Y$ + Z$
- 9AD625FF4A CALL B$PESD ;print with end of line
-
-
- Even though the first example uses a single PRINT statement, BASIC treats
- it as three separate commands:
-
- PRINT X$;
- PRINT Y$;
- PRINT Z$
-
- The second example that concatenates the strings requires slightly more
- code because of the repeated calls to the B$SCAT (string concatenate)
- routine. Therefore, if you are using QuickBASIC it is clear that printing
- the strings separately is both smaller and faster. BASIC PDS users must
- decide between slightly faster performance, or slightly smaller code.
- These tests were repeated 1000 times to minimize the inaccuracies
- introduced by the timer's low resolution. Since this method of timing can
- be off by as much as 1/18th second (55 milliseconds), for test results to
- be accurate to 1% the test must take at least 5.5 seconds to complete.
- In most cases that much precision is not truly necessary, and other factors
- such as the time to use LOCATE will prevent absolute accuracy anyway.
- It is important that any timing tests you perform be done after
- compiling the program to an .EXE file. The BASIC editor is an interpreter,
- and is generally slower than a stand-alone program. Further, the reduction
- in speed is not consistent; some statements are nearly as fast as in a
- compiled program, and some are much slower.
- To obtain more accurate results than those shown here requires some
- heavy ammunition; I recommend the Source Profiler from Microsoft. This
- is a utility program that times procedure calls within a running program
- to an accuracy of one microsecond. The Source Profiler supports all
- Microsoft languages including QuickBASIC and BASIC PDS.
- To time a program you must compile and link it using the /zi and /co
- CodeView switches. This tells BASIC and LINK to add symbolic information
- that shows where variables and procedures are located, and also relates
- each logical line of source code to addresses in the .EXE file. The Source
- Profiler then uses this information to know where each source-language
- statement begins and ends.
- You should also understand that there's a certain amount of overhead
- associated with the timing loop itself. Any FOR/NEXT loop requires a
- certain amount of time just to increment the counter variable and compare
- it to the ending value. Fortunately, this overhead can be easily isolated,
- using an empty loop with the same number of iterations. The short complete
- program that follows shows this in context.
-
- Synch! = TIMER
- DO
- Start! = TIMER
- LOOP WHILE Start! = Synch!
-
- FOR X& = 1 TO 50000
- NEXT
-
- Done! = TIMER
- Empty! = Done! - Start!
- PRINT USING "##.## seconds for the empty loop"; Empty!
-
- Synch! = TIMER
- DO
- Start! = TIMER
- LOOP WHILE Start! = Synch!
-
- FOR X& = 1 TO 50000
- X! = -Y!
- NEXT
-
- Done! = TIMER
- Assign! = Done! - Start!
- PRINT USING "##.## seconds for the assignments"; Assign!
-
- Actual! = Assign! - Empty!
- PRINT USING "##.## seconds actually required"; Actual!
-
- SUMMARY
- =======
-
- In this chapter you learned a variety of programming shortcuts and other
- techniques. You saw firsthand how it is more efficient to avoid using CHR$
- and other BASIC functions repeatedly, in favor a single call ahead of time
- when possible. In a similar vein, you can reduce the size of your programs
- by consolidating multiple instances of UCASE$, MID$, LTRIM$, and other
- functions once before a series of IF tests, rather than use them each time
- for each test.
- You also learned that assigning multiple variables in succession from
- another often results in smaller code than assigning from the same numeric
- constant. Short circuit expression evaluation was described, and examples
- showed you how that technique can improve the efficiency of a QuickBASIC
- program. But since BASIC PDS already employs this optimization, multiple
- AND conditions are not needed when using that version of compiler.
- This chapter explained the importance of reducing the number of
- parameters you pass to a subprogram or function, and showed how you can use
- GOSUB to invoke a central handler that in turn calls the routine.
- Likewise, when using BASIC statements such as LOCATE, COLOR, and CLOSE that
- require additional arguments beyond those you specify, a substantial amount
- of code can be saved by creating a BASIC subprogram wrapper. Examples for
- turning the cursor on and off were shown, and these can save 13 and 15
- bytes per use respectively.
- Several programming techniques were shown, including a word wrap
- subprogram, a numeric rounding function, and a simple way to reboot the PC.
- You also learned how small amounts of data can be safely stored in the last
- 96 bytes of video memory, perhaps for use as a common data area between
- separately run [non-chained] programs.
- Finally, this chapter included a brief discussion of some of the issues
- surrounding benchmarking, and explained how to obtain reasonably accurate
- statement timings. To determine the size of the compiler-generated code
- requires disassembling with CodeView.
- Chapter 10 continues with a complete list of key addresses in low memory
- you are sure to find useful, and discusses each in depth along with
- accompanying examples.